ABSTRACT

Image  quality  assessment  plays  a  major  role  in  many  image  processing  applications.  Although  much  effort  has  been 
made  in  recent  years  towards  the  development  of  quantitative  measures,  the  relevant  literature  does  not  include  many 
papers  that  have  produced  accomplished  results.  Ideally,  a  useful  measure  should  be  easy  to  compute,  independent  of 
viewing  distance,  and  able  to  quantify  all  types  of  image  distortions.  In  this  paper,  we  will  compare  three  full-reference 
full-color  image  quality  measures  (M-DFT,  M-DWT,  and  M-DCT).  Assume  the  size  of  a  given  image  is  nxn.  The 
transform  (DFT,  DWT,  or  DCT)  is  applied  to  the  luminance  layer  of  the  original  and  degraded  images.  The  transform 
coefficients are then divided into four bands, and the following operations are performed for each band: (a) obtain the magnitudes $M_{o_i}$, $i = 1, \ldots, (n \times n/4)$, of the original transform coefficients, (b) obtain the magnitudes $M_{d_i}$, $i = 1, \ldots, (n \times n/4)$, of the degraded transform coefficients, (c) compute the absolute value of the differences: $|M_{o_i} - M_{d_i}|$, $i = 1, \ldots, (n \times n/4)$, and (d) compute the standard deviation of the differences. Finally, the mean of the four standard deviations is obtained to
produce  a  single  value  representing  the  overall  quality  of  the  degraded  image.  In  our  experiments,  we  have  used  five 
degradation types, and five degradation levels. The three proposed full-reference measures outperform the Peak Signal-to-Noise Ratio (PSNR), and two state-of-the-art metrics, Q and MSSIM.

Keywords:  image  quality,  quantitative  measures,  subjective  evaluation,  PSNR,  Q,  MSSIM,  DFT,  DWT,  and  DCT. 

1.  INTRODUCTION 

An  important  criterion  used  in  the  classification  of  image  quality  measures  is  the  type  of  information  needed  to  evaluate 
the  distortion  in  degraded  images.  Measures  that  require  both  the  original  image  and  the  distorted  image  are  called 
“full-reference”  or  “non-blind”  methods,  measures  that  do  not  require  the  original  image  are  called  no-reference  or 
“blind”  methods,  and  measures  that  require  both  the  distorted  image  and  partial  information  about  the  original  image  are 
called  “reduced-reference”  methods. 

Although no-reference measures are needed in some applications in which the original image is not available, they can be used to predict only a small number of distortion types. In the current literature, a few papers attempt to predict JPEG compression artifacts [1,2,3,4], and others blurring and JPEG 2000 artifacts [5,6]. Reduced-reference measures
are  between  full-reference  and  no-reference  measures;  [7]  evaluates  the  quality  of  JPEG  and  JPEG2000  coded  images 
whereas  [8]  provides  assessment  for  JPEG  and  JPEG2000  compressed  images,  images  distorted  by  white  Gaussian 
noise,  Gaussian  blur,  and  the  transmission  errors  in  JPEG2000  bit  streams. 

The  applicability  of  full-reference  measures  is  much  wider.  They  can  be  used  to  estimate  a  spectrum  of  distortions  that 
range from blurriness and blockiness to several types of noise. Recent examples of such measures are given in Table 1.


Corresponding  author’s  email  address:  eskicioglu@sci.brooklyn.cuny.edu 


REPORT DOCUMENTATION PAGE



AFRL-SR-AR-TR-07-0146

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE

3. REPORT TYPE AND DATES COVERED
Final / 12-1-2005 - 11/30/2006

4. TITLE AND SUBTITLE
Quality Measures Using Singular Value Decomposition

5. FUNDING NUMBERS
FA9550-05-1-0400

6. AUTHOR(S)
Ahmet Eskicioglu

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Research Foundation of CUNY
230 West 41st Street, 7th Floor
New York, New York, 10036-7296

8. PERFORMING ORGANIZATION REPORT NUMBER

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION / AVAILABILITY STATEMENT
Approved for public release, distribution unlimited.

12b. DISTRIBUTION CODE





Table 1. Full-reference image quality measures

| Publication | Domain type | Type of distortion predicted |
|---|---|---|
| Wang and Bovik [11] | Pixel | Impulsive salt-pepper noise, additive Gaussian noise, multiplicative speckle noise, mean shift, contrast stretching, blurring, and JPEG compression |
| Wang, Bovik, Sheikh and Simoncelli [12] | Pixel | JPEG compression, JPEG 2000 compression |
| Van der Weken, Nachtegael and Kerre [9] | Pixel | Salt and pepper noise, enlightening, and darkening |
| Beghdadi and Pesquet-Popescu [10] | Discrete Wavelet Transform (DWT) | Gaussian noise, grid pattern, JPEG compression |

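A full-reference paper [11] presents a new numerical measure for gray scale images, called the Universal Image Quality Index, Q, which is defined as

$$Q = \frac{4\,\sigma_{xy}\,\mu_x\,\mu_y}{(\sigma_x^2 + \sigma_y^2)(\mu_x^2 + \mu_y^2)}$$

where $x_i$, $y_i$, $i = 1, \ldots, n$ represent the original and distorted signals, respectively, $\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \mu_x)^2$, $\sigma_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \mu_y)^2$, and $\sigma_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)$.

The dynamic range of Q is [-1,1], with the best value achieved when $y_i = x_i$, $i = 1, 2, \ldots, n$. This index models any distortion as a combination of three different factors - loss of correlation, mean distortion, and variance distortion:

$$Q = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y} \cdot \frac{2\,\mu_x\,\mu_y}{\mu_x^2 + \mu_y^2} \cdot \frac{2\,\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

It is applied to 512x512, 8 bits/pixel Lena using a sliding window approach with a window size of 8x8. The index is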
computed  for  each  window,  leading  to  a  quality  map  of  the  image.  The  overall  quality  index  is  the  average  of  all  the  Q 
values  in  the  quality  map: 


$$Q = \frac{1}{M}\sum_{j=1}^{M} Q_j, \qquad M = \text{total number of windows.}$$
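The windowed computation of Q can be illustrated with a minimal Python sketch. The function names `q_index` and `mean_q`, the one-pixel window step, and the use of NumPy are assumptions made for this illustration; only the 8x8 window size and the averaging over all windows come from the text.

```python
import numpy as np

def q_index(x, y):
    """Universal Image Quality Index for two equally sized windows."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)          # 1/(n-1) normalization
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    denom = (var_x + var_y) * (mu_x ** 2 + mu_y ** 2)
    # No guard is applied here: the result is undefined when denom is 0.
    return 4.0 * cov_xy * mu_x * mu_y / denom

def mean_q(img_x, img_y, win=8, step=1):
    """Average Q over all win x win windows (step=1 is an assumption)."""
    rows, cols = img_x.shape
    values = []
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            values.append(q_index(img_x[r:r + win, c:c + win],
                                  img_y[r:r + win, c:c + win]))
    return float(np.mean(values))
```

For two identical, non-constant images every window yields Q = 1, so the average is 1 as well.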


Q produces unstable results when either $(\mu_x^2 + \mu_y^2)$ or $(\sigma_x^2 + \sigma_y^2)$ is very close to zero. To avoid this problem, the measure has been generalized to the Structural Similarity Index (SSIM) [12]:

$$\mathrm{SSIM} = \frac{(2\,\mu_x\,\mu_y + C_1)(2\,\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

Q is a special case of SSIM that can be derived by setting $C_1$ and $C_2$ to 0. The performance of SSIM has been tested using a database of JPEG and JPEG 2000 compressed color images. In the experiments, only the luminance component of each compressed image is used. The authors argue that the use of color components does not significantly change the performance of the model, even though they acknowledge the fact that this may not be generally true for color image quality assessment. As in the case of Q, the overall image quality MSSIM is obtained by computing the average of SSIM values over all windows:

$$\mathrm{MSSIM} = \frac{1}{M}\sum_{j=1}^{M} \mathrm{SSIM}_j$$
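The stabilizing role of $C_1$ and $C_2$ can be seen in a short Python sketch. The names `ssim_window` and `mssim`, the default constants (taken as $(0.01 \cdot 255)^2$ and $(0.03 \cdot 255)^2$ for 8-bit data), and the uniform 8x8 windows with a one-pixel step are assumptions for illustration; the text above only specifies the SSIM formula and the averaging over windows.

```python
import numpy as np

def ssim_window(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM for two equally sized windows; c1 and c2 are assumed defaults."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def mssim(img_x, img_y, win=8, step=1):
    """Mean SSIM over all win x win windows (uniform windows are an assumption)."""
    rows, cols = img_x.shape
    values = [
        ssim_window(img_x[r:r + win, c:c + win], img_y[r:r + win, c:c + win])
        for r in range(0, rows - win + 1, step)
        for c in range(0, cols - win + 1, step)
    ]
    return float(np.mean(values))
```

With positive constants, a completely flat window (zero variance) still gives a finite value, which is the case where Q with $C_1 = C_2 = 0$ breaks down.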


In  this  paper,  we  will  compare  three  full-reference  full  color  image  quality  measures: 

•  M-DFT:  A  Full-Reference  Color  Image  Quality  Measure  in  the  DFT  Domain  [13] 

•  M-DWT:  A  Full-Reference  Color  Image  Quality  Measure  in  the  DWT  Domain  [14] 

•  M-DCT:  A  Full-Reference  Color  Image  Quality  Measure  in  the  DCT  Domain  [15] 

YUV and RGB are two of the commonly used color models for images and video. The YUV model is obtained by a linear transformation of the gamma-corrected RGB components that produces a luminance signal and a pair of chrominance signals. The luminance signal conveys color brightness levels, and each chrominance signal gives the difference between a color and a reference white at the same luminance. A common approach employed in developing a quality measure for color images is to use only the luminance signal.

Assume  the  size  of  a  given  image  is  nxn.  Each  proposed  algorithm  is  as  follows: 

2.  Apply  the  transform  (DFT,  DWT,  or  DCT)  to  the  luminance  layer  of  the  original  image. 

3.  Apply  the  transform  to  the  luminance  layer  of  the  degraded  image. 

4.  Divide  the  transform  coefficients  into  four  bands. 

5.  For  each  band,  perform  the  following  operations: 

a.  Obtain the magnitudes $M_{o_i}$, $i = 1, \ldots, (n \times n/4)$, of the original transform coefficients.

b.  Obtain the magnitudes $M_{d_i}$, $i = 1, \ldots, (n \times n/4)$, of the degraded transform coefficients.

c.  Compute the absolute value of the differences: $|M_{o_i} - M_{d_i}|$, $i = 1, \ldots, (n \times n/4)$.

d.  Compute  the  standard  deviation  of  the  differences. 

6.  Obtain  the  mean  of  four  standard  deviations. 
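The steps above can be sketched in Python for the DFT case. Several details are assumptions, since the text does not specify them: the four bands are taken to be the four quadrants of the coefficient array (each holding n x n/4 coefficients), the luminance is computed with the common 0.299/0.587/0.114 weights, and the population standard deviation is used. The function names are hypothetical.

```python
import numpy as np

def luminance(rgb):
    """Luminance layer of an (n, n, 3) RGB array; the weights are an assumption."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def split_into_bands(coeffs):
    """Split an n x n coefficient array into four quadrants (assumed band layout)."""
    h = coeffs.shape[0] // 2
    w = coeffs.shape[1] // 2
    return [coeffs[:h, :w], coeffs[:h, w:], coeffs[h:, :w], coeffs[h:, w:]]

def m_dft_sketch(original_rgb, degraded_rgb):
    """Mean over four bands of the std of |magnitude differences|."""
    coeffs_o = np.fft.fft2(luminance(original_rgb))
    coeffs_d = np.fft.fft2(luminance(degraded_rgb))
    stds = []
    for band_o, band_d in zip(split_into_bands(coeffs_o),
                              split_into_bands(coeffs_d)):
        diff = np.abs(np.abs(band_o) - np.abs(band_d))   # steps a, b, c
        stds.append(diff.std())                          # step d
    return float(np.mean(stds))                          # final mean
```

Under these assumptions, two identical images give a value of 0, and the value grows as the magnitude spectra diverge. A DWT or DCT variant would replace only the transform call and, for the DWT, could use the four subbands of a one-level decomposition as the bands.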


2.  EXPERIMENTS 

The three measures were applied to four 512x512 full color images (Lena, Goldhill, Peppers, and Airplane). The images, shown in Figure 1, differ in frequency content, ranging from low frequency content (e.g., clouds in the Airplane image) to high frequency content (e.g., feathers in the Lena image).


Figure 1. Test images: Lena, Goldhill, Peppers, and Airplane


Table  2  shows  the  tools  and  parameters  for  five  degradation  types,  and  five  degradation  levels.  Note  that  all  of  these 
degradations  are  performed  in  the  pixel  domain.  The  25  distorted  images  for  each  test  image  are  shown  in  Figures  3-6. 


For each test image, high quality printouts of 25 distorted images were subjectively evaluated by approximately 15 observers. The printer was a Hewlett-Packard printer with model number “HP Color Laserjet 4600dn.” The 8”x8” images were printed on 8-1/2”x11” white paper with a basis weight of 20 lb and a brightness of 84.

Table 2. Distortion types and distortion levels

| Type \ Level | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| JPEG (XnView) | 20:1 | 40:1 | 60:1 | 80:1 | 100:1 |
| JPEG2000 (XnView) | 20:1 | 40:1 | 60:1 | 80:1 | 100:1 |
| Gaussian blur (Photoshop) | 1 | 2 | 3 | 4 | 5 |
| Gaussian noise (Photoshop) | 3 | 6 | 9 | 12 | 15 |
| Sharpening (XnView) | 10 | 20 | 30 | 40 | 50 |

The  observers  were  chosen  among  the  graduate  students  and  instructors  from  the  Department  of  Computer  and 
Information  Science  at  Brooklyn  College.  About  half  of  the  observers  were  familiar  with  image  processing,  and  the 
others  had  a  general  computer  science  background.  They  were  asked  to  rank  the  images  using  a  50-point  scale  in  two 
ways:  within  a  given  distortion  type  (i.e.,  rating  of  the  5  distorted  images),  and  across  five  distortion  types  (i.e.,  rating 
of  the  5  distorted  images  for  each  distortion  level). 

As  the  proposed  measure  is  not  HVS-based,  no  viewing  distance  was  imposed  on  the  observers  in  the  experiment.  In 
subjective  evaluation  [16],  the  widest  scale  is  [0,10].  In  order  to  give  the  observers  a  wider  scale,  grade  1  was  assigned 
to  the  best  image,  and  grade  50  was  assigned  to  the  worst  image. 

We  will  show  the  overall  performance  of  the  measures  using  two  criteria:  Correlation  Coefficient  (CC)  and  Root  Mean 
Squared  Error  (RMSE)  between  Mean  Opinion  Score  (MOS)  and  objective  prediction.  The  real  success  of  objective 
quality  assessment  can  be  determined  by  predicting  the  quality  not  only  within  a  given  distortion  type  but  also  across 
different  distortion  types. 
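The two criteria can be computed as in the following Python sketch. It assumes that the objective predictions have already been mapped onto the MOS scale, and the function name `cc_and_rmse` is hypothetical.

```python
import numpy as np

def cc_and_rmse(mos, predicted):
    """Pearson correlation coefficient and RMSE between MOS and predictions."""
    mos = np.asarray(mos, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    cc = np.corrcoef(mos, predicted)[0, 1]
    rmse = np.sqrt(np.mean((mos - predicted) ** 2))
    return float(cc), float(rmse)
```

A higher CC and a lower RMSE both indicate closer agreement with the subjective scores; the two can disagree, for instance when predictions are perfectly correlated with MOS but offset by a constant.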

We  will  also  compute  two  additional  sets  of  data  in  comparing  the  performance  of  the  measures: 

•  CC and RMSE within each of the 5 distortion types, and

•  CC  and  RMSE  across  each  of  the  5  distortion  levels. 

Finally,  we  will  compare  the  performance  of  the  three  measures  with  PSNR,  and  two  state-of-the-art  metrics,  Q  and 
MSSIM. 

The  main  purpose  of  the  Video  Quality  Experts  Group  (VQEG)  is  to  provide  input  to  the  relevant  standardization  bodies 
responsible  for  producing  international  Recommendations  regarding  the  definition  of  an  objective  Video  Quality  Metric 
(VQM)  in  the  digital  domain. 

In  the  FR-TV  Phase  I  testing  and  validation,  a  nonlinear  mapping  between  the  objective  model  outputs  and  subjective 
quality  ratings  was  used  [17].  The  performance  of  the  9  proponent  models  was  evaluated  after  compensating  for  the 
nonlinearity.  In  our  experiments,  we  followed  the  same  procedure  by  fitting  a  logistic  curve  to  establish  a  nonlinear 
mapping.  The  logistic  function  has  the  form 


$$\mathrm{logistic}(\tau, x) = \frac{1}{2} - \frac{1}{1 + \exp(\tau x)}$$

where $\tau$ is a constant parameter.


Figure 3. Lena

Figure 4. Goldhill

Figure 5. Peppers

Figure 6. Airplane

(Figures 3-6: each figure is a grid labeled Level/Distortion type, with the columns Gaussian blur, JPEG, JPEG 2000, Gaussian noise, and Sharpening.)


2.1  M-DFT:  A  Full-Reference  Color  Image  Quality  Measure  in  the  DFT  Domain 

Figure 7 shows the fitted curves for the four measures compared (PSNR, Q, MSSIM, and M-DFT).


Table  3  displays  the  overall  performance  of  the  measures  using  two  criteria:  Correlation  Coefficient  (CC)  and  Root 
Mean  Squared  Error  (RMSE)  between  MOS  and  objective  prediction. 


Table 3. Comparison of four measures

| Criteria/Measure | PSNR | Q | MSSIM | M-DFT |
|---|---|---|---|---|
| CC | 0.8038 | 0.8482 | 0.8282 | 0.8800 |
| RMSE | 7.0857 | 6.3096 | 6.6749 | 5.6579 |

The performance of the measures within each distortion type and across different distortion types is given in Tables 4 and 5, respectively. We observe that M-DFT outperforms all the other measures not only in overall performance but also within each distortion type and across each distortion level.

Table 4. (a) CC-based performance within each distortion type

| Distortion type/Measure | PSNR | Q | MSSIM | M-DFT |
|---|---|---|---|---|
| JPEG | 0.8877 | 0.9136 | 0.8768 | 0.9314 |
| JPEG2000 | 0.7799 | 0.7810 | 0.8354 | 0.9042 |
| Gaussian blur | 0.8773 | 0.9124 | 0.9248 | 0.9804 |
| Gaussian noise | 0.9947 | 0.9585 | 0.9766 | 0.9950 |
| Sharpening | 0.9513 | 0.9662 | 0.9627 | 0.9739 |

(b) RMSE-based performance within each distortion type

| Distortion type/Measure | PSNR | Q | MSSIM | M-DFT |
|---|---|---|---|---|
| JPEG | 4.7647 | 4.2082 | 4.9762 | 3.7709 |
| JPEG2000 | 3.5505 | 3.5387 | 3.1151 | 2.4208 |
| Gaussian blur | 6.1557 | 5.2497 | 4.8780 | 2.5280 |
| Gaussian noise | 0.9181 | 2.5556 | 1.9291 | 0.8941 |
| Sharpening | 1.0498 | 0.8783 | 0.9215 | 0.7723 |

Table 5. (a) CC-based performance across each distortion level

| Distortion level/Measure | PSNR | Q | MSSIM | M-DFT |
|---|---|---|---|---|
| 1 | 0.8024 | 0.7783 | 0.8436 | 0.9726 |
| 2 | 0.8246 | 0.8405 | 0.8542 | 0.9426 |
| 3 | 0.8361 | 0.8408 | 0.8402 | 0.8710 |
| 4 | 0.8422 | 0.8601 | 0.8549 | 0.8969 |
| 5 | 0.8358 | 0.8818 | 0.8755 | 0.9057 |

(b) RMSE-based performance across each distortion level

| Distortion level/Measure | PSNR | Q | MSSIM | M-DFT |
|---|---|---|---|---|
| 1 | 2.1151 | 2.2253 | 1.9028 | 0.8239 |
| 2 | 3.3381 | 3.1974 | 3.0679 | 1.9707 |
| 3 | 4.6552 | 4.5944 | 4.6026 | 4.1711 |
| 4 | 5.6156 | 5.3127 | 5.4026 | 4.6057 |
| 5 | 6.7775 | 5.8218 | 5.9646 | 5.2368 |

2.2  M-DWT:  A  Full-Reference  Color  Image  Quality  Measure  in  the  DWT  Domain 

Figure 8 shows the fitted curves for the four measures compared (PSNR, Q, MSSIM, and M-DWT).

Figure 8. Comparison of the scatter plots for PSNR, Q, MSSIM, and M-DWT. In each plot, the y-axis represents the Mean Opinion Score (MOS), the x-axis represents the quantitative measure, and each mark represents one distorted image. The mapping between the distortion types and the marks is as follows: JPEG (□), JPEG 2000 (Δ), Gaussian blur (o), Gaussian noise (◊), Sharpening (x).

Table  6  displays  the  overall  performance  of  the  measures  using  two  criteria:  Correlation  Coefficient  (CC)  and  Root 
Mean  Squared  Error  (RMSE)  between  MOS  and  objective  prediction. 


Table  6.  Comparison  of  four  measures 


The  performance  of  the  measures  within  each  distortion  type  and  across  different  distortion  types  are  given  in  Tables  7 
and  8,  respectively. 


We observe that the overall performance of M-DWT is better than the other three measures. The performance is not as good as the performance of PSNR, Q, and MSSIM for three types of distortion. However, the performance of M-DWT is considerably more consistent across distortion levels, a problem which is very difficult to solve.

Table 7. (a) CC-based performance within each distortion type

| Distortion type\Measure | PSNR | Q | MSSIM | M-DWT |
|---|---|---|---|---|
| JPEG | 0.8877 | 0.9136 | 0.8768 | 0.6123 |
| JPEG2000 | 0.7799 | 0.7810 | 0.8354 | 0.5559 |
| Gaussian blur | 0.8773 | 0.9124 | 0.9248 | 0.9336 |
| Gaussian noise | 0.9947 | 0.9585 | 0.9766 | 0.8834 |
| Sharpening | 0.9513 | 0.9662 | 0.9627 | 0.9777 |

(b) RMSE-based performance within each distortion type

| Distortion type\Measure | PSNR | Q | MSSIM | M-DWT |
|---|---|---|---|---|
| JPEG | 4.7647 | 4.2082 | 4.9762 | 8.1811 |
| JPEG2000 | 3.5505 | 3.5387 | 3.1151 | 4.7102 |
| Gaussian blur | 6.1557 | 5.2497 | 4.8780 | 4.5941 |
| Gaussian noise | 0.9181 | 2.5556 | 1.9291 | 4.2007 |
| Sharpening | 1.0498 | 0.8783 | 0.9215 | 0.7150 |

Table 8. (a) CC-based performance across each distortion level

| Distortion level\Measure | PSNR | Q | MSSIM | M-DWT |
|---|---|---|---|---|
| 1 | 0.8024 | 0.7783 | 0.8436 | 0.8788 |
| 2 | 0.8246 | 0.8405 | 0.8542 | 0.9417 |
| 3 | 0.8361 | 0.8408 | 0.8402 | 0.9010 |
| 4 | 0.8422 | 0.8601 | 0.8549 | 0.8979 |
| 5 | 0.8358 | 0.8818 | 0.8755 | 0.8963 |

(b) RMSE-based performance across each distortion level

| Distortion level\Measure | PSNR | Q | MSSIM | M-DWT |
|---|---|---|---|---|
| 1 | 2.1151 | 2.2253 | 1.9028 | 1.6911 |
| 2 | 3.3381 | 3.1974 | 3.0679 | 1.9851 |
| 3 | 4.6552 | 4.5944 | 4.6026 | 3.6815 |
| 4 | 5.6156 | 5.3127 | 5.4026 | 4.5840 |
| 5 | 6.7775 | 5.8218 | 5.9646 | 5.4741 |

2.3  M-DCT: A Full-Reference Color Image Quality Measure in the DCT Domain

Figure 9 shows the fitted curves for the four measures compared (PSNR, Q, MSSIM, and M-DCT).


Figure 9. Comparison of the scatter plots for M-DCT, PSNR, Q, and MSSIM (panels: PSNR, Q, MSSIM, M-DCT). In each plot, the y-axis represents the Mean Opinion Score (MOS), the x-axis represents the quantitative measure, and each mark represents one distorted image. The mapping between the distortion types and the marks is as follows: JPEG (□), JPEG 2000 (Δ), Gaussian blur (o), Gaussian noise (◊), Sharpening (x).


Table  9  displays  the  overall  performance  of  the  measures  using  two  criteria:  Correlation  Coefficient  (CC)  and  Root 
Mean  Squared  Error  (RMSE)  between  MOS  and  objective  prediction. 


Table 9. Comparison of four measures

| Criteria\Measure | PSNR | Q | MSSIM | M-DCT |
|---|---|---|---|---|
| CC | 0.8038 | 0.8482 | 0.8282 | 0.8743 |
| RMSE | 7.0857 | 6.3096 | 6.6749 | 5.7821 |

The performance of the measures within each distortion type and across different distortion types is given in Tables 10 and 11, respectively.

We observe that the overall performance of M-DCT is better than the other three measures. Only for Gaussian blur is the performance of M-DCT slightly worse than the performances of Q and MSSIM. Across each distortion level, M-DCT outperforms all the other measures in comparison.

Table 10. (a) CC-based performance within each distortion type

| Distortion type\Measure | PSNR | Q | MSSIM | M-DCT |
|---|---|---|---|---|
| JPEG | 0.8877 | 0.9136 | 0.8768 | 0.9470 |
| JPEG2000 | 0.7799 | 0.7810 | 0.8354 | 0.9156 |
| Gaussian blur | 0.8773 | 0.9124 | 0.9248 | 0.8868 |
| Gaussian noise | 0.9947 | 0.9585 | 0.9766 | 0.9955 |
| Sharpening | 0.9513 | 0.9662 | 0.9627 | 0.9931 |

(b) RMSE-based performance within each distortion type

| Distortion type\Measure | PSNR | Q | MSSIM | M-DCT |
|---|---|---|---|---|
| JPEG | 4.7647 | 4.2082 | 4.9762 | 3.3238 |
| JPEG2000 | 3.5505 | 3.5387 | 3.1151 | 2.2783 |
| Gaussian blur | 6.1557 | 5.2497 | 4.8780 | 5.9259 |
| Gaussian noise | 0.9181 | 2.5556 | 1.9291 | 0.8485 |
| Sharpening | 1.0498 | 0.8783 | 0.9215 | 0.3984 |

Table 11. (a) CC-based performance across each distortion level

| Distortion level\Measure | PSNR | Q | MSSIM | M-DCT |
|---|---|---|---|---|
| 1 | 0.8024 | 0.7783 | 0.8436 | 0.9326 |
| 2 | 0.8246 | 0.8405 | 0.8542 | 0.9178 |
| 3 | 0.8361 | 0.8408 | 0.8402 | 0.8790 |
| 4 | 0.8422 | 0.8601 | 0.8549 | 0.8853 |
| 5 | 0.8358 | 0.8818 | 0.8755 | 0.8850 |

(b) RMSE-based performance across each distortion level

| Distortion level\Measure | PSNR | Q | MSSIM | M-DCT |
|---|---|---|---|---|
| 1 | 2.1151 | 2.2253 | 1.9028 | 1.2793 |
| 2 | 3.3381 | 3.1974 | 3.0679 | 2.3430 |
| 3 | 4.6552 | 4.5944 | 4.6026 | 4.0473 |
| 4 | 5.6156 | 5.3127 | 5.4026 | 4.8427 |
| 5 | 6.7775 | 5.8218 | 5.9646 | 5.7474 |

3.  CONCLUSIONS 

We presented three full-reference quality measures for color images. Each measure is based on a particular transform: (a) M-DFT uses the Discrete Fourier Transform, (b) M-DWT uses the Discrete Wavelet Transform, and (c) M-DCT uses the Discrete Cosine Transform. When the overall performance is considered, all three measures outperform the other measures (i.e., PSNR, Q, and MSSIM). The other experimental results can be summarized as follows:

•  The performance of M-DFT is better than PSNR, Q, and MSSIM for each distortion type and distortion level.

•  The performance of M-DWT is not as good as the performance of PSNR, Q, and MSSIM for three types of distortion (JPEG compression, JPEG 2000 compression, and Gaussian noise). However, the performance of M-DWT is considerably more consistent across distortion levels, providing higher CCs and lower RMSEs.

•  The  performance  of  M-DCT  is  slightly  worse  than  the  performances  of  Q  and  MSSIM  for  Gaussian  blur.  Across 
each  distortion  level,  M-DCT  outperforms  all  the  other  measures. 

In future work, we will apply the proposed measures to watermarked images and video sequences.

ABSTRACT

A  recent  image  quality  measure,  M-SVD,  can  express  the  quality  of  distorted  images  either  numerically  or  graphically. 
Based  on  the  Singular  Value  Decomposition  (SVD),  it  consistently  measures  the  distortion  across  different  distortion 
types and within a given distortion type at different distortion levels. The SVD decomposes every real matrix into a product of three matrices $A = USV^T$, where U and V are orthogonal matrices, $U^TU = I$, $V^TV = I$, and $S = \mathrm{diag}(s_1, s_2, \ldots)$. The diagonal entries of S are called the singular values of A, the columns of U are called the left singular vectors of A, and the columns of V are called the right singular vectors of A. M-SVD, as a graphical measure, computes the distance between the singular values of the original image block and the singular values of the distorted image block, where n x n is the block size. If the image size is k x k, we have (k/n) x (k/n) blocks. The set of distances, when displayed in a graph,
represents  a  “distortion  map.”  The  numerical  measure  is  derived  from  the  graphical  measure.  It  computes  the  global 
error  expressed  as  a  single  numerical  value.  In  this  paper,  we  will  extend  the  SVD-based  image  quality  measure  to 
evaluate  the  visual  quality  of  watermarked  images  using  several  watermarking  schemes. 

Keywords:  Singular  Value  Decomposition,  image  quality,  watermarking,  Discrete  Wavelet  Transform,  Discrete  Cosine 
Transform,  Discrete  Fourier  Transform,  Quantization  Index  Modulation,  Lapped  Orthogonal  Transform. 

1.  INTRODUCTION 

Measurement  of  image  quality  is  a  challenging  problem  in  many  image  processing  fields  ranging  from  lossy 
compression  to  printing.  The  quality  measures  in  the  literature  can  be  classified  into  two  groups:  Subjective  and 
objective.  Subjective  evaluation  is  cumbersome  as  the  human  observers  can  be  influenced  by  several  critical  factors 
such  as  the  environmental  conditions,  motivation,  and  mood.  The  most  common  objective  evaluation  tool,  the  Mean 
Square  Error  (MSE),  is  very  unreliable,  resulting  in  poor  correlation  with  the  human  visual  system  (HVS)  [1].  In  spite  of 
their  complicated  algorithms,  the  more  recent  HVS-based  objective  measures  do  not  appear  to  be  superior  to  the  simple 
pixel-based  measures  like  the  MSE,  Peak  Signal-to-Noise  Ratio  (PSNR),  or  Root  Mean  Squared  Error  (RMSE).  It  is 
argued  that  an  ideal  image  quality  measure  should  be  able  to  describe  [2]: 

(1)  amount  of  distortion, 

(2)  type  of  distortion,  and 

(3)  distribution  of  error. 

Undoubtedly,  there  is  a  need  for  an  objective  measure  that  provides  more  information  than  a  single  numerical  value. 
Only  a  few  multi-dimensional  measures  exist  in  the  relevant  literature  today  [2]. 

Image  quality  measures  can  be  classified  using  a  number  of  criteria  such  as  the  type  of  domain  (pixel  or  transform),  the 
type  of  distortion  predicted  (noise,  blur,  etc.),  and  the  type  of  information  needed  to  assess  the  quality  (original  image, 
distorted  image,  etc.).  Table  1  gives  a  classification  based  on  these  three  criteria,  and  includes  representative  examples 
of recently published papers. Measures that require both the original image and the distorted image are called “full-reference” or “non-blind” methods, measures that do not require the original image are called “no-reference” or “blind”
methods,  and  measures  that  require  both  the  distorted  image  and  partial  information  about  the  original  image  are  called 
“reduced-reference”  methods. 


*  Corresponding author’s email address: eskicioglu@sci.brooklyn.cuny.edu

We have recently developed a new measure, M-SVD, that can express the quality of distorted images either numerically or graphically [3,4,5]. Based on the Singular Value Decomposition (SVD), it consistently measures the distortion across different distortion types and within a given distortion type at different distortion levels. Comparison with the state-of-the-art metrics Q and MSSIM shows that the performance of M-SVD is superior.

Every real matrix A can be decomposed into a product of 3 matrices $A = USV^T$, where U and V are orthogonal matrices, $U^TU = I$, $V^TV = I$, and $S = \mathrm{diag}(s_1, s_2, \ldots)$. The diagonal entries of S are called the singular values of A, the columns of U
are  called  the  left  singular  vectors  of  A,  and  the  columns  of  V  are  called  the  right  singular  vectors  of  A.  This 
decomposition  is  known  as  the  Singular  Value  Decomposition  of  A.  It  is  one  of  the  most  useful  tools  of  linear  algebra 
with  several  applications  to  multimedia,  including  image  compression  and  watermarking. 
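The decomposition can be checked numerically with NumPy. The 2 x 2 matrix below is an arbitrary example chosen for illustration and does not come from the paper.

```python
import numpy as np

# Arbitrary real matrix used only as an illustration.
A = np.array([[4.0, 0.0],
              [3.0, -5.0]])

# np.linalg.svd returns U, the singular values s (descending), and V^T.
U, s, Vt = np.linalg.svd(A)
S = np.diag(s)

reconstructed = U @ S @ Vt                            # should reproduce A
orthogonal_u = np.allclose(U.T @ U, np.eye(2))        # U^T U = I
orthogonal_v = np.allclose(Vt @ Vt.T, np.eye(2))      # V^T V = I
```

The singular values are returned as a vector rather than as the diagonal matrix S, which is why `np.diag` is applied before the product is formed.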


Table 1. Classification of image quality measures

| Publication\Criterion | Domain type | Type of distortion predicted | Type of information needed |
|---|---|---|---|
| | Pixel | Salt and pepper noise, enlightening and darkening | Full-reference |
| Beghdadi and Pesquet-Popescu, 2003 [7] | Discrete Wavelet Transform | Gaussian noise, grid pattern, JPEG compression | Full-reference |
| Bovik and Liu, 2001 [8] | Discrete Cosine Transform | JPEG compression | No-reference |
| | Discrete Fourier Transform | JPEG compression | No-reference |
| | Pixel | JPEG compression | No-reference |
| Marziliano, Dufaux, Winkler and Ebrahimi, 2002 [11] | Pixel | Gaussian blur, JPEG 2000 compression | No-reference |
| | Pixel | Gaussian blur, JPEG 2000 compression | No-reference |
| | Pixel | JPEG compression | No-reference |
| Carnec, Le Callet and Barba, 2003 [14] | Pixel | JPEG compression, JPEG 2000 compression | Reduced-reference |

M-SVD is a bivariate measure that computes the distance between the singular values of the original image block and the
singular values of the distorted image block:

$$D_j = \sqrt{\sum_{i=1}^{n} (s_i - \hat{s}_i)^2},$$

where $s_i$ are the singular values of the original block, $\hat{s}_i$ are the singular values of the distorted block,
and n x n is the block size. If the image size is k x k, we have (k/n) x (k/n) blocks. The set of distances, when
displayed in a graph, represents a “distortion map.”

The numerical measure is derived from the graphical measure. It computes the global error expressed as a single
numerical value, depending on the distortion type:

$$\text{M-SVD} = \frac{\sum_{j=1}^{(k/n)\times(k/n)} \left| D_j - D_{mid} \right|}{(k/n)\times(k/n)},$$

where $D_{mid}$ represents the mid point of the sorted $D_j$ values.
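The graphical and numerical measures described above can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation: the function names are hypothetical, the test image is random data, and the "mid point of the sorted distances" is read here as the median, which is an assumption.

```python
import numpy as np

def block_distances(original, distorted, n=8):
    """Distortion map: one distance between singular values per n x n block."""
    rows, cols = original.shape
    D = np.zeros((rows // n, cols // n))
    for bi in range(rows // n):
        for bj in range(cols // n):
            blk = (slice(bi * n, (bi + 1) * n), slice(bj * n, (bj + 1) * n))
            s_orig = np.linalg.svd(original[blk], compute_uv=False)
            s_dist = np.linalg.svd(distorted[blk], compute_uv=False)
            D[bi, bj] = np.sqrt(np.sum((s_orig - s_dist) ** 2))
    return D

def m_svd(original, distorted, n=8):
    """Global error: mean absolute deviation of the block distances from their mid point."""
    D = block_distances(original, distorted, n)
    d_mid = np.median(D)  # assumption: "mid point of the sorted distances" taken as the median
    return np.sum(np.abs(D - d_mid)) / D.size

# Made-up 32x32 "image" and a noisy copy, only to exercise the functions.
rng = np.random.default_rng(0)
original = rng.random((32, 32)) * 255.0
distorted = original + rng.normal(0.0, 5.0, size=original.shape)

distortion_map = block_distances(original, distorted)
score = m_svd(original, distorted)
print(distortion_map.shape, score)
```

With 8x8 blocks, a 32x32 input yields a 4x4 distortion map, and comparing an image with itself gives a global error of zero.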
In this paper, we will extend the SVD-based image quality measure to evaluate the visual quality of watermarked
images. We have developed three watermarking algorithms:

•  A  non-blind  scheme  using  DWT  and  SVD  [15] 

•  A  non-blind  scheme  using  DWT  [16] 

•  A  semi-blind  scheme  using  DFT  [17] 

We  also  have  access  to  the  benchmarking  suite  [18]  in  the  School  of  Electrical  and  Computer  Engineering  at  Purdue 
University.  On  this  web  site,  there  are  several  watermarking  schemes  including: 

•  A  non-blind  scheme  using  DCT  [19] 

•  A  semi-blind  scheme  using  DWT  [20] 

•  A  blind  scheme  based  on  the  Quantization  Index  Modulation  Algorithm  (QIM)  [21] 

•  A  non-blind  scheme  using  the  Lapped  Orthogonal  Transform  (LOT)  based  adaptive  algorithm  [22] 

We  will  use  M-SVD  to  compare  the  visual  quality  of  watermarked  images  obtained  by  the  above  algorithms. 

The  embedding  and  detection  algorithms  for  the  watermarking  schemes  are  as  follows: 

Reference  15: 

Watermark  embedding 

1. Using DWT, decompose the cover image $A$ into 4 subbands: LL, HL, LH, and HH.

2. Apply SVD to each subband image: $A^k = U_a^k \Sigma_a^k V_a^{kT}$, $k = 1,2,3,4$, where $k$ denotes the LL, HL, LH,
and HH bands, and $\lambda_i^k$, $i = 1,\ldots,n$, are the singular values of $\Sigma_a^k$.

3. Apply SVD to the visual watermark: $W = U_W \Sigma_W V_W^T$, where $\lambda_{Wi}$, $i = 1,\ldots,n$, are the singular
values of $\Sigma_W$.

4. Modify the singular values of the cover image in each subband with the singular values of the visual watermark:
$\lambda_i^{*k} = \lambda_i^k + \alpha_k \lambda_{Wi}$, $i = 1,\ldots,n$, and $k = 1,2,3,4$.

5. Obtain the 4 sets of modified DWT coefficients: $A^{*k} = U_a^k \Sigma_a^{*k} V_a^{kT}$, $k = 1,2,3,4$.

6. Apply the inverse DWT using the 4 sets of modified DWT coefficients to produce the watermarked cover image $A_W$.


Watermark  extraction 


1. Using DWT, decompose the watermarked (and possibly attacked) cover image $A_W^*$ into 4 subbands: LL, HL, LH,
and HH.

2. Apply SVD to each subband image: $A_W^{*k} = U_a^{*k} \Sigma_a^{*k} V_a^{*kT}$, $k = 1,2,3,4$, where $k$ denotes the
attacked LL, HL, LH, and HH bands.

3. Extract the singular values from each subband: $\lambda_{Wi}^k = (\lambda_i^{*k} - \lambda_i^k)/\alpha_k$,
$i = 1,\ldots,n$, and $k = 1,2,3,4$.

4. Construct the four visual watermarks using the singular vectors: $W^k = U_W \Sigma_W^k V_W^T$, $k = 1,2,3,4$.
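The singular-value modification and its inversion can be illustrated on a single matrix. The sketch below is a simplification under stated assumptions: it skips the DWT entirely and treats a random 8x8 array as one subband, the function names and the scaling factor `alpha` are hypothetical, and no attack is applied between embedding and extraction.

```python
import numpy as np

def embed_band(band, watermark, alpha):
    """Add the scaled watermark singular values to those of one subband."""
    U, lam, Vt = np.linalg.svd(band)
    lam_w = np.linalg.svd(watermark, compute_uv=False)
    lam_star = lam + alpha * lam_w
    return U @ np.diag(lam_star) @ Vt

def extract_band(marked_band, band, watermark, alpha):
    """Non-blind extraction: needs the original subband and the watermark's singular vectors."""
    lam_star = np.linalg.svd(marked_band, compute_uv=False)
    lam = np.linalg.svd(band, compute_uv=False)
    U_w, _, V_wt = np.linalg.svd(watermark)
    lam_w_est = (lam_star - lam) / alpha
    return U_w @ np.diag(lam_w_est) @ V_wt

# Made-up data: a random "subband" and a random binary watermark of the same size.
rng = np.random.default_rng(1)
band = rng.random((8, 8)) * 255.0
watermark = (rng.random((8, 8)) > 0.5).astype(float)
alpha = 0.05

marked = embed_band(band, watermark, alpha)
recovered = extract_band(marked, band, watermark, alpha)
print("max recovery error:", np.abs(recovered - watermark).max())
```

Because both singular-value sequences are sorted in descending order, their weighted sum is also descending, so the extraction recovers the watermark singular values exactly when the marked band is not attacked.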
Reference 16:

Watermark embedding (first level decomposition)

1. Using two-dimensional separable dyadic DWT, obtain the first level decomposition of the cover image $I$.

2. Modify the DWT coefficients $V_{ij}^k$ in the LL, HL, LH, and HH bands: $V_{w,ij}^k = V_{ij}^k + \alpha_k W_{ij}$,
$i,j = 1,\ldots,n$, and $k = 1,2,3,4$.

3. Apply inverse DWT to obtain the watermarked cover image $I_W$.

Watermark extraction (first level decomposition)

1. Using two-dimensional separable dyadic DWT, obtain the first level decomposition of the watermarked (and
possibly attacked) cover image $I_W^*$.

2. Extract the binary visual watermark from the LL, HL, LH, and HH bands:
$W_{ij}^{*k} = (V_{w,ij}^{*k} - V_{ij}^k)/\alpha_k$, $i,j = 1,\ldots,n$, and $k = 1,2,3,4$.

3. If $W_{ij}^{*k} > 0.5$, then $W_{ij}^{*k} = 1$, else $W_{ij}^{*k} = 0$.

Watermark embedding (second level decomposition)

1. Using two-dimensional separable dyadic DWT, obtain the second level decomposition of the cover image $I$.

2. Modify the DWT coefficients $V_{ij}^k$ in the LL2, HL2, LH2, and HH2 bands:
$V_{w,ij}^k = V_{ij}^k + \alpha_k W_{ij}$, $i,j = 1,\ldots,n$, and $k = 1,2,3,4$.

3. Apply inverse DWT to obtain the watermarked cover image $I_W$.

Watermark extraction (second level decomposition)

1. Using two-dimensional separable dyadic DWT, obtain the second level decomposition of the watermarked (and
possibly attacked) cover image $I_W^*$.

2. Extract the binary visual watermark from the LL2, HL2, LH2, and HH2 bands:
$W_{ij}^{*k} = (V_{w,ij}^{*k} - V_{ij}^k)/\alpha_k$, $i,j = 1,\ldots,n$, and $k = 1,2,3,4$.

3. If $W_{ij}^{*k} > 0.5$, then $W_{ij}^{*k} = 1$, else $W_{ij}^{*k} = 0$.
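The additive embedding rule and the thresholded extraction used in Reference 16 can be sketched on a tiny coefficient array. This is an illustration only: the DWT itself is omitted, the coefficient values, the scaling factor `alpha`, and the perturbation standing in for an attack are all made up, and the function names are hypothetical.

```python
def embed(coeffs, watermark, alpha):
    """V_w = V + alpha * W, applied element by element to one band."""
    return [[v + alpha * w for v, w in zip(v_row, w_row)]
            for v_row, w_row in zip(coeffs, watermark)]

def extract(marked, coeffs, alpha):
    """W* = (V_w* - V) / alpha, then threshold at 0.5 to get binary bits."""
    bits = []
    for m_row, v_row in zip(marked, coeffs):
        bits.append([1 if (m - v) / alpha > 0.5 else 0
                     for m, v in zip(m_row, v_row)])
    return bits

# Made-up 2x2 band of coefficients and a 2x2 binary watermark.
coeffs = [[10.0, -3.5], [0.25, 7.0]]
watermark = [[1, 0], [0, 1]]
alpha = 2.0

marked = embed(coeffs, watermark, alpha)

# A mild, made-up perturbation standing in for an attack.
attacked = [[marked[0][0] + 0.4, marked[0][1] - 0.4],
            [marked[1][0] + 0.4, marked[1][1] - 0.4]]

recovered = extract(attacked, coeffs, alpha)
print(recovered)
```

The extraction is non-blind because it subtracts the original coefficients; the 0.5 threshold tolerates perturbations smaller than half the scaling factor.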

Reference  17: 

Watermark  embedding 

1. Convert the NxN RGB cover image to YUV.

2.  Compute  the  DFT  of  the  luminance  layer. 

3.  Move  the  origin  to  the  center. 

4.  Obtain  the  magnitudes  of  DFT  coefficients. 

5. Divide the NxN matrix of magnitudes into four (N/2)x(N/2) matrices $M_{ul}$, $M_{ur}$, $M_{ll}$, $M_{lr}$
(ul: upper left, ur: upper right, ll: lower left, lr: lower right).

6.  Define  three  frequency  bands:  low,  middle,  and  high. 

7.  Embed  a  visual  binary  watermark  in  these  three  bands  by  determining  the  embedding  locations. 
8. In each band, repeat the following steps for all pairs of magnitudes:

a. Get a pair of magnitudes a and b: a from matrix $M_{ul}$ and the corresponding magnitude b from matrix $M_{ur}$.

b. Compute the mean m = (a+b)/2, and choose the value of the parameter p.

c. Embedding bit 1:

If a < m-(p/2*m) then do not modify a and b
else a=m-(p/2*m) and b=m+(p/2*m)

d. Embedding bit 0:

If a > m+(p/2*m) then do not modify a and b
else a=m+(p/2*m) and b=m-(p/2*m)

9. Copy the modified magnitudes in matrix $M_{ul}$ to matrix $M_{lr}$.

10. Copy the modified magnitudes in matrix $M_{ur}$ to matrix $M_{ll}$.

11. Obtain the DFT coefficients of the entire image using the modified magnitudes.

12.  Compute  the  inverse  DFT  to  obtain  the  luminance  layer  of  the  watermarked  cover  image. 

13.  Convert  the  YUV  watermarked  cover  image  to  RGB. 

Watermark  extraction: 

1.  Convert  the  NxN  watermarked  (and  possibly  attacked)  RGB  cover  image  to  YUV. 

2.  Compute  the  DFT  of  the  luminance  layer. 

3. Move the origin to the center.

4. Obtain the magnitudes of DFT coefficients.

5. Divide the NxN matrix of magnitudes into four (N/2)x(N/2) matrices $M_{ul}$, $M_{ur}$, $M_{ll}$, $M_{lr}$.

6. Use the three frequency bands and the embedding locations defined in the embedding process: low, middle, and
high.

7. For each pair of magnitudes in each band, if a > b then bit=0 else bit=1.
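The pairwise rule in step 8 of the embedding and step 7 of the extraction can be sketched for a single pair of magnitudes. This is an illustration only: the DFT, the band selection, and the embedding locations are omitted, the value of `p` is a made-up choice, `p/2*m` is read as `(p/2)*m`, and the function names are hypothetical.

```python
def embed_bit(a, b, bit, p):
    """Force a < b for bit 1 and a > b for bit 0, keeping the pair's mean m."""
    m = (a + b) / 2
    delta = (p / 2) * m
    if bit == 1:
        if a < m - delta:
            return a, b          # already separated enough in the right direction
        return m - delta, m + delta
    if a > m + delta:
        return a, b
    return m + delta, m - delta

def extract_bit(a, b):
    """Semi-blind decision: only the (possibly attacked) pair is needed."""
    return 0 if a > b else 1

# Made-up positive magnitude pairs; p = 0.2 is an arbitrary illustrative value.
p = 0.2
pairs = [(5.0, 3.0), (3.0, 5.0), (4.0, 4.0), (1.0, 10.0), (10.0, 1.0)]
for a, b in pairs:
    for bit in (0, 1):
        a2, b2 = embed_bit(a, b, bit, p)
        print((a, b), bit, "->", (a2, b2), "decoded:", extract_bit(a2, b2))
```

Larger values of `p` widen the gap between the two magnitudes, which makes the decision more robust at the cost of a larger change to the spectrum. A pair with both magnitudes equal to zero cannot carry a bit under this rule, since the gap `(p/2)*m` collapses to zero.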

Reference  19: 

Watermark  embedding 

1. Compute the DCT of the NxN gray scale cover image $I$.

2. Embed a sequence of real values $X = x_1, x_2, \ldots, x_n$, drawn according to $N(0,1)$, into the $n$ largest
magnitude DCT coefficients, excluding the DC component: $v_i' = v_i(1 + \alpha x_i)$.

3. Compute the inverse DCT to obtain the watermarked cover image.

Watermark detection

1. Compute the DCT of the watermarked (and possibly attacked) cover image $I^*$.

2. Extract the watermark $X^*$: $x_i^* = (v_i^* - v_i)/(\alpha v_i)$, $i = 1,\ldots,n$.

3. Evaluate the similarity of $X^*$ and $X$ using $sim(X, X^*) = \frac{X \cdot X^*}{\sqrt{X^* \cdot X^*}}$.

4. If $sim(X, X^*) > T$, a given threshold, the watermark exists.
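The multiplicative embedding rule and the similarity test of Reference 19 can be sketched directly on a list of coefficients. This is an illustration under stated assumptions: the DCT is omitted, the coefficient values are random stand-ins for the largest-magnitude AC coefficients, `alpha = 0.1` is a made-up strength, and the function names are hypothetical.

```python
import math
import random

def embed(v, x, alpha):
    """v'_i = v_i * (1 + alpha * x_i)."""
    return [vi * (1 + alpha * xi) for vi, xi in zip(v, x)]

def extract(v_star, v, alpha):
    """x*_i = (v*_i - v_i) / (alpha * v_i); needs the original coefficients."""
    return [(vs - vi) / (alpha * vi) for vs, vi in zip(v_star, v)]

def sim(x, x_star):
    """sim(X, X*) = (X . X*) / sqrt(X* . X*)."""
    dot = sum(a * b for a, b in zip(x, x_star))
    return dot / math.sqrt(sum(b * b for b in x_star))

rng = random.Random(1)
n = 1000
v = [rng.uniform(50.0, 500.0) for _ in range(n)]      # made-up coefficient values
x = [rng.gauss(0.0, 1.0) for _ in range(n)]           # the embedded watermark
other = [rng.gauss(0.0, 1.0) for _ in range(n)]       # an unrelated watermark
alpha = 0.1

x_star = extract(embed(v, x, alpha), v, alpha)
print("sim with embedded watermark: ", sim(x, x_star))
print("sim with unrelated watermark:", sim(other, x_star))
```

Without an attack, the similarity for the embedded watermark equals the norm of X, roughly the square root of n, while an unrelated watermark gives a value near zero, which is why a fixed threshold T separates the two cases.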
Reference 20:

Watermark embedding

1. Compute the DWT of the NxN gray scale cover image $I$.

2. Exclude the low pass DWT coefficients.

3. Embed the watermark into the DWT coefficients $t_i > T_1$: $t_i' = t_i + \alpha |t_i| x_i$, where $i$ runs over all
DWT coefficients $> T_1$.

4. Replace $T = \{t_i\}$ with $T' = \{t_i'\}$ in the DWT domain.

5. Compute the inverse DWT to obtain the watermarked cover image.

Watermark detection

1. Compute the DWT of the watermarked (and possibly attacked) cover image $I^*$.

2. Exclude the low pass DWT coefficients.

3. Select all the DWT coefficients higher than $T_2$.

4. Compute the sum $z = \frac{1}{M} \sum_{i=1}^{M} y_i t_i^*$, where $i$ runs over all DWT coefficients $> T_2$, $y_i$
represents either the real watermark or a fake watermark, and $t_i^*$ represents the watermarked and possibly
attacked DWT coefficients.

5. Choose a predefined threshold $T_z = \frac{\alpha}{2M} \sum_{i=1}^{M} |t_i^*|$.

6. If $z$ exceeds $T_z$, the conclusion is that the watermark is present.
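The embedding rule and the correlation detector of Reference 20 can be sketched on a flat list of coefficients. This is a rough illustration, not the benchmarked implementation: the DWT is omitted, the coefficients are random stand-ins, the thresholds and `alpha` are made-up values, coefficients are selected by magnitude (an assumption about how "coefficients > T" is applied to signed values), and the detector is read as z = (1/M) sum(y_i t_i*) with threshold alpha/(2M) sum(|t_i*|).

```python
import random

def embed(coeffs, x, alpha, t1):
    """t'_i = t_i + alpha * |t_i| * x_i for coefficients whose magnitude exceeds t1."""
    return [t + alpha * abs(t) * xi if abs(t) > t1 else t
            for t, xi in zip(coeffs, x)]

def detect(coeffs_star, y, alpha, t2):
    """Correlate candidate watermark y with the coefficients above t2."""
    picked = [(t, yi) for t, yi in zip(coeffs_star, y) if abs(t) > t2]
    m = len(picked)
    if m == 0:
        return 0.0, 0.0, False
    z = sum(yi * t for t, yi in picked) / m
    threshold = alpha / (2 * m) * sum(abs(t) for t, _ in picked)
    return z, threshold, z > threshold

rng = random.Random(7)
count = 20000
coeffs = [rng.choice((-1, 1)) * rng.uniform(5.0, 100.0) for _ in range(count)]
x = [rng.gauss(0.0, 1.0) for _ in range(count)]       # real watermark
fake = [rng.gauss(0.0, 1.0) for _ in range(count)]    # fake watermark
alpha, t1, t2 = 0.2, 40.0, 50.0                       # illustrative values, with t2 > t1

marked = embed(coeffs, x, alpha, t1)
print("real:", detect(marked, x, alpha, t2))
print("fake:", detect(marked, fake, alpha, t2))
```

The detector does not use the original coefficients. Choosing the detection threshold above the embedding threshold keeps unmarked coefficients out of the correlation sum.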


Reference  21: 

Watermark  embedding 

The encoder embeds the watermark into the host signal: $S(x; m) = q_\Delta(x + d(m)) - d(m)$, where $x$ is the host
signal, $q_\Delta(\cdot)$ is a uniform quantizer with quantization interval $\Delta$, $d(m)$ is the dither signal, and
$m = 0,1$ is the message bit. The dither signals $d(0)$ and $d(1)$ are constructed with the constraint

$d(1) = d(0) + \Delta/2$ if $d[k,0] < 0$,
$d(1) = d(0) - \Delta/2$ if $d[k,0] \geq 0$.

This constraint ensures that the two dithered signals $S(x; 0)$ and $S(x; 1)$ have the maximum distance from each
other. $d(0)$ is the output from a uniform pseudo random number generator with range $[-\Delta/2, \Delta/2]$.

Watermark  detection 

The decoder decodes the watermark message $m$ from the received composite signal $S(x; m)$ without having access to the
original  host  signal  x.  The  message  m  that  should  be  transmitted  is  the  index  for  the  quantizer  used  for  quantizing  the 
host-signal  vector.  While  retrieving  the  hidden  information,  one  evaluates  a  distance  metric  to  all  quantizers.  The  index 
of  the  quantizer  with  the  smallest  distance  contributes  to  the  message  m. 
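Dithered quantization and minimum-distance decoding can be sketched for a single scalar sample. This is a simplified illustration: the host values, the step size `DELTA`, and the additive noise are made up, one dither value is used per bit instead of a dither vector, and the function names are hypothetical.

```python
import random

def quantize(u, delta):
    """Uniform quantizer with step delta."""
    return delta * round(u / delta)

def make_dither(delta, rng):
    """d(0) is uniform in [-delta/2, delta/2]; d(1) is shifted by half a step."""
    d0 = rng.uniform(-delta / 2, delta / 2)
    d1 = d0 + delta / 2 if d0 < 0 else d0 - delta / 2
    return d0, d1

def embed(x, bit, delta, dither):
    """S(x; m) = q(x + d(m)) - d(m)."""
    d = dither[bit]
    return quantize(x + d, delta) - d

def decode(y, delta, dither):
    """Blind decoding: pick the quantizer whose nearest point is closest to y."""
    dist = [abs(y - embed(y, m, delta, dither)) for m in (0, 1)]
    return 0 if dist[0] <= dist[1] else 1

DELTA = 8.0
dither = make_dither(DELTA, random.Random(3))

for x in (-37.3, 0.0, 12.9, 101.4):
    for bit in (0, 1):
        s = embed(x, bit, DELTA, dither)
        noisy = s + 1.5   # made-up perturbation smaller than DELTA / 4
        print(x, bit, "->", round(s, 3), "decoded:", decode(noisy, DELTA, dither))
```

The two dithered lattices are offset by half a step, so decoding stays correct as long as the perturbation is smaller than a quarter of the quantization interval.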


Reference  22: 


Watermark  embedding 

1.  The  Lapped  Orthogonal  Transform  (LOT)  divides  the  NxN  cover  image  I  into  overlapping  16x16  blocks,  and  maps 
each  block  into  an  8x8  block  in  the  frequency  domain. 

2.  Introduce  a  perceptual  analysis  module  to  extract  from  each  block  a  feature  known  as  the  Texture  Masking  Energy 
(TME).  The  blocks  are  classified  into  one  of  the  following  four  categories  according  to  the  TME:  texture,  fine- 
texture,  edge,  and  flat  region.  The  edge  blocks  are  further  categorized  according  to  the  direction  of  the  edges.  The 
watermark  embedding  energy  in  each  block  is  adjusted  accordingly  to  adapt  to  the  sensitivities  of  the  HVS. 

3. The quantization matrix for the $k$th block is $Q_L(k) = Q_s \times M_{block}(k) \times M_{edge}(k)$, where
“$\times$” denotes the element-wise multiplication. $Q_s$ is the standard quantization matrix used in JPEG, and
$M_{block}(k)$, $M_{edge}(k)$ are used to adjust the quantization steps in $Q_s$ for the $k$th block.

4. In each block, the 5 most visually important AC coefficients bear the watermark:

$X'_k(i_{n,k}, j_{n,k}) = X_k(i_{n,k}, j_{n,k}) + \alpha Q_L(i_{n,k}, j_{n,k})\, w(n)$,

where $X$ is the LOT coefficient of the original image, $X'$ is the corresponding watermarked LOT coefficient, $w(n)$ is
the watermark element, and $(i_{n,k}, j_{n,k})$ denotes the specified position to bear the watermark in the $k$th block.

5. Compute the inverse LOT to obtain the watermarked cover image $I'$.

Watermark  detection 


1. Extract the watermark using the LOT coefficients of the watermarked (and possibly attacked) cover image $I^*$ and
the LOT coefficients of the original cover image $I$:
$w^*(n) = (X_k^*(i_{n,k}, j_{n,k}) - X_k(i_{n,k}, j_{n,k})) / (\alpha Q_L(i_{n,k}, j_{n,k}))$.

2. Compare the extracted watermark $W^*$ and the original watermark $W$ using the formula
$sim(W, W^*) = \frac{W^* \cdot W}{\sqrt{W^* \cdot W^*}}$.

3. If the value of $sim(W, W^*)$ is larger than a specified threshold, the watermark is present.

2.  EXPERIMENTS 


The original 512x512 images used with the watermarking algorithms are shown in Figure 1.


Figure 1. Original cover images


The  luminance  layer  of  each  full  color  watermarked  image,  the  watermarked  gray  scale  images,  the  corresponding 
distortion  maps,  and  the  values  of  the  numerical  measure  are  given  in  Figure  2. 
Figure 2. Watermarked images, distortion maps, and numerical measure values. Each panel pairs a watermarked image
with its distortion map and global error; the panels include Reference 16 (second level), Reference 17, and
Reference 22.
For the Quantization Index Modulation (QIM) algorithm [21], the text embedded in the color image Peppers is:

Brooklyn  College 
The  City  University  of  New  York 
2900  Bedford  Avenue 
Brooklyn,  NY  11210 

3.  CONCLUSIONS 

A recent measure, M-SVD, can express the quality of distorted images either numerically or graphically. Based on the
Singular Value Decomposition (SVD), it consistently measures the distortion both across different distortion types and
within a given distortion type at different distortion levels. The SVD decomposes every real matrix into a product of 3
matrices $A = USV^T$, where $U$ and $V$ are orthogonal matrices, $U^TU = I$, $V^TV = I$, and
$S = \mathrm{diag}(s_1, s_2, \ldots)$. The diagonal entries of $S$ are called the singular values of $A$, the columns of
$U$ are called the left singular vectors of $A$, and the columns of $V$ are called the right singular vectors of $A$.

M-SVD, as a graphical measure, computes the distance between the singular values of the original image block and the
singular values of the distorted image block, where n x n is the block size. If the image size is k x k, we have
(k/n) x (k/n) blocks. The set of distances, when displayed in a graph, represents a “distortion map.” The numerical
measure is derived from the graphical measure. It computes the global error expressed as a single numerical value.

In  this  paper,  we  extended  the  SVD-based  image  quality  measure  to  evaluate  the  visual  quality  of  watermarked  images 
using  seven  algorithms. 

Even  if  the  hacker  knows  the  algorithm  for  M-SVD,  she  will  not  be  able  to  understand  the  algorithms  for  [17],  [19],  and 
[22]  as  these  algorithms  produce  distortion  maps  that  appear  to  be  random. 

For  [15],  [16],  [20],  and  [21],  the  graphical  measure  indicates  what  has  been  embedded. 

• For the DWT-SVD domain algorithm [15], the contours of Lena are apparent.

•  For  the  DWT  domain  algorithm  [16],  the  embedded  binary  logo  (“BC”)  is  definitely  visible. 

•  For  the  DWT  domain  algorithm  [20],  which  embeds  the  watermark  in  LH,  HL,  and  HH  bands,  the  edges  of  the 
peppers  can  be  noticed  because  these  three  bands  correspond  to  higher  frequency  areas. 

•  For  QIM  [21],  if  the  embedded  text  is  longer,  the  graphical  measure  adds  more  columns  in  the  distortion  map. 

The numerical measure computes a global error with values that range from 0.385 to 16.152, depending on what is
embedded in the cover images. The smallest and largest values are computed for [22] and [19], respectively, probably
for the following reasons:

• In [22], an HVS model is used, and only the 5 most visually important AC coefficients are modified in each 8x8
block in the LOT domain.

•  In  [19],  no  HVS  model  is  used,  and  the  highest  1000  AC  coefficients  are  modified  in  the  DCT  domain. 

In  future  work,  we  will  apply  M-SVD  to  video  sequences  such  as  akiyo,  flowergarden,  and  tennis. 
ABSTRACT

Objective video quality measurement is a challenging problem in a variety of video processing applications ranging from
lossy compression to printing. An ideal video quality measure should be able to mimic the human observer. We present
a new video quality measure, M-SVD, to evaluate distorted video sequences based on singular value decomposition. A
computationally efficient approach is developed for full-reference (FR) video quality assessment. This measure is tested
on the Video Quality Experts Group (VQEG) Phase I FR-TV test data set. Our experiments show that the graphical measure
displays the amount of distortion as well as the distribution of error in all frames of the video sequence, while the
numerical measure correlates well with perceived video quality and outperforms PSNR and other objective measures
by a clear margin.

Keywords: video quality measure, subjective evaluation, singular value decomposition, M-SVD, full-reference, Video
Quality Experts Group (VQEG), PSNR


1.  INTRODUCTION 

Objective image and video quality measures play an important role in a variety of image/video processing applications
ranging from lossy compression to printing. Quality measures can be classified into two categories: subjective
and objective [1]. Subjective evaluation is cumbersome, as the human observers can be influenced by several critical
factors such as the environmental conditions, motivation, and mood. Objective evaluation is considerably more stable but
may not correlate well with the Human Visual System (HVS) [2,3,4,5,6,7]. Objective measures in the literature can be
classified into three types according to the type of information needed during quality assessment: measures that require
both the original video and the distorted video are called “full-reference” or “non-blind” methods [8,9], measures that
do not require the original video are called “no-reference” or “blind” methods [10,11,12,13,14], and measures that
require both the distorted video and partial information about the original video are called “reduced-reference” methods
[15]. Currently, the most commonly used full-reference objective evaluation tools, the Mean Square Error (MSE) and Peak
Signal-to-Noise Ratio (PSNR), are very unreliable, resulting in poor correlation with the HVS. Many efforts have been
made to design image/video quality assessment models that incorporate perceptual quality measures by considering the
characteristics of the HVS. In spite of their complicated algorithms, the more recent HVS-based objective measures do
not appear to be superior to the simple pixel-based MSE and PSNR. There is an increasing need to develop an objective
quality measure that can predict the perceived video quality automatically. Moreover, it is argued that an ideal
image/video quality measure should be able to describe the amount of distortion as well as the distribution of error.
Undoubtedly, there is a need for an objective measure that provides more information than a single numerical value. Very
few multi-dimensional measures exist in the relevant literature today.

A recent paper [16] developed a new measure, M-SVD, that can express the quality of distorted images either numerically
or graphically. Comparison with the state-of-the-art metrics UQI [17] and MSSIM [18] shows that the performance of
M-SVD is superior. In this paper, we propose an improved version of the algorithm and employ it for video quality
assessment. The proposed algorithm is applied to the VQEG Phase I [19,20] test dataset for FR-TV video quality
assessment. The Video Quality Experts Group (VQEG) was formed to develop, validate, and standardize new objective
measurement methods for video quality. The new video quality assessment measure is introduced in Section 2, and
experimental results on the VQEG Phase I FR-TV video dataset are presented in Section 3. Finally, Section 4 provides a
brief conclusion and a discussion of future work.


2.  QUALITY  MEASURES  USING  M-SVD 

2.1 M-SVD for gray scale images

A recently developed measure, M-SVD [16,21,22], can express the quality of distorted images either numerically or
graphically. Based on the Singular Value Decomposition (SVD) [23,24], it consistently measures the distortion both
across different distortion types and within a given distortion type at different distortion levels. Comparison with the
state-of-the-art metrics UQI and MSSIM shows that the performance of M-SVD is superior.

Every real matrix $A$ can be decomposed into a product of 3 matrices $A = USV^T$, where $U$ and $V$ are orthogonal
matrices, $U^TU = I$, $V^TV = I$, and $S = \mathrm{diag}(s_1, s_2, \ldots)$. The diagonal entries of $S$ are called the
singular values of $A$, the columns of $U$ are called the left singular vectors of $A$, and the columns of $V$ are called
the right singular vectors of $A$. This decomposition is known as the Singular Value Decomposition of $A$. It is one of
the most useful tools of linear algebra with several applications to multimedia, including image compression and
watermarking.

The proposed graphical measure is a bivariate measure that computes the distance between the singular values of the
original image block and the singular values of the distorted image block:

$$D_k = \sqrt{\sum_{i=1}^{n} (s_i - \hat{s}_i)^2} \qquad (1)$$

where $s_i$ and $\hat{s}_i$ are the singular values of the original block and the distorted block, $D_k$ is the distance
between the singular values for the $k$-th distorted block, and $n$ is the block size. If the image size is r x c, we
have (r/n) x (c/n) blocks. The set of distances, when displayed in a graph, represents a "distortion map."

The numerical measure is derived from the graphical measure. It computes the global error expressed as a single
numerical value depending on the distortion map:

$$\text{M-SVD} = \frac{\sum_{i=1}^{(k/n)\times(k/n)} \left| D_i - D_{mid} \right|}{(k/n)\times(k/n)} \qquad (2)$$

where $D_{mid}$ represents the mid point of the sorted $D_i$, r x c is the image size (the formula is written for a
square image with r = c = k), and $n$ is the block size.

The measure is applied to 512x512 gray-scale images. The distortion types, the distortion levels, and the associated
parameters are listed in Table 1. Fig. 1 shows the 512x512 Lena image distorted by 6 different distortion types at 5
different levels. Fig. 2 presents the distortion maps, which convey the amount of distortion, the type of distortion, and
the distribution of error; they are obtained as gray-scale images by mapping the $D_i$ values to the range [0,255]. In
their experiments, the authors chose 8 as the block size. Although the size of each distortion map is 64x64, the maps are
enlarged to the size of the distorted image to make the pixel values more visible. Darker pixel values indicate smaller
distances, and lighter pixel values indicate larger distances. The global error, expressed as a single numerical value
that summarizes the overall error, is computed as in equation 2. The Pearson correlation between this numerical value
and the subjective evaluation is then computed. Analysis shows that the performance of M-SVD is much better than that of
PSNR, UQI, and MSSIM. Table 2 and Table 3 display the correlation coefficients between subjective evaluation and M-SVD,
in comparison with other objective models, across different distortion types and at different distortion levels.


Table 1. Distortion types and levels applied to the tested image.

| Type \ Level | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
| JPEG | 10:1 | 20:1 | 30:1 | 40:1 | 50:1 |
| JPEG2000 | 10:1 | 20:1 | 30:1 | 40:1 | 50:1 |
| Gaussian blur | 1 | 2 | 3 | 4 | 5 |
| Gaussian noise | 3 | 6 | 9 | 12 | 15 |
| Sharpening | 10 | 20 | 30 | 40 | 50 |
| DC-shifting | 4 | 8 | 12 | 16 | 20 |

Fig. 1. The distorted Lena image across 6 different distortion types at 5 different levels.

Fig. 2. The corresponding distortion maps of the distorted images in Fig. 1.
Table 2. Correlation coefficients between subjective evaluation and M-SVD in comparison with other objective models
across each distortion type.

| Distortion Type \ Measure | PSNR | UQI | MSSIM | M-SVD |
| JPEG | 0.974 | 0.904 | 0.928 | 0.977 |
| JPEG2000 | 0.949 | 0.688 | 0.801 | 0.952 |
| Gaussian blur | 0.816 | 0.917 | 0.906 | 0.929 |
| Gaussian noise | 0.901 | 0.984 | 0.987 | 0.975 |
| Sharpening | 0.955 | 0.908 | 0.947 | 0.937 |
| DC-shifting | 0.914 | 0.637 | 0.643 | 0.718 |

Table 3. Correlation coefficients between subjective evaluation and M-SVD in comparison with other objective models
across each distortion level.

| Distortion Level \ Measure | PSNR | UQI | MSSIM | M-SVD |
| 1 | 0.808 | 0.744 | 0.781 | 0.890 |
| 2 | 0.751 | 0.808 | 0.853 | 0.954 |
| 3 | 0.529 | 0.885 | 0.910 | 0.962 |
| 4 | 0.369 | 0.914 | 0.929 | 0.958 |
| 5 | 0.439 | 0.940 | 0.947 | 0.924 |

2.2  Video  Quality  Assessment  Algorithm 

The framework of the video quality assessment system is shown in Fig. 4. First, both the original and processed videos
are subjected to a pre-processing procedure. The original and processed video sequences are resampled to 4:4:4 Y, Cb, Cr
format. A spatial-temporal-luminance alignment is included in the system to normalize the input sequences. In the
second step, a set of frames of a video sequence is divided into 8x8 blocks. The block-based SVD is computed on the
luminance layer (Y) [10] in both the original frames and the corresponding processed frames. Then, the local error
measure of equation 1 computes the distance between the singular values of the original frame block and the singular
values of the distorted frame block.

Fig. 4. The framework of the video quality assessment system (inputs: original video and processed video; output:
quality measure).

It is well known that human eyes are highly sensitive to high contrast areas, especially edges. The edges are computed by
calculating the local gradient of the luminance signal (using a Sobel-like spatial filter) in the luminance layer. An
edge function is used to detect edges; it returns a binary image containing 1's where edges are found and 0's elsewhere.
This output binary image allows us to assign an edge index to each block within a frame. Blocks that lie on object
boundaries, and therefore contain edges, receive a higher edge index, as follows:


$$B_k = \ldots \qquad (3)$$

and

$$B_k = \begin{cases} B_k & \text{if } B_k > \tau \\ \ldots & \text{if } B_k < \tau \end{cases} \qquad (4)$$

where $B_k$ denotes the edge index of the $k$-th block, r x c denotes the frame dimension, $n$ is the block size, and
$\tau$ is a threshold that determines the strength of the edge index. The local error measure with edge detection is
given by:

$$D_k = B_k \sqrt{\sum_{i=1}^{n} (s_i - \hat{s}_i)^2} \qquad (5)$$

The frame-level error measure described in equation 2 is expressed as a single value that depends on the local error
measures. This measure is applied to the Y, Cb, and Cr components separately, and the results are combined into the
frame-level error measure using a weighted summation. Let $\text{M-SVD}_j^{Y}$, $\text{M-SVD}_j^{Cb}$, and
$\text{M-SVD}_j^{Cr}$ be the error measures of the Y, Cb, and Cr components of the $j$-th frame, respectively. The
combined quality error index is:

$$\text{M-SVD}_j = W_Y\,\text{M-SVD}_j^{Y} + W_{Cb}\,\text{M-SVD}_j^{Cb} + W_{Cr}\,\text{M-SVD}_j^{Cr} \qquad (6)$$

where the weights $W_Y$, $W_{Cb}$, and $W_{Cr}$ are obtained experimentally. In the above formula, the contributions from
the luminance and the two chrominance components are 80%, 10%, and 10%, respectively. Therefore, the luminance component
makes the major contribution.

The local error output can be mapped to a gray scale image as values in the range [0, 255]. Therefore, the local error
can be expressed as a 3-dimensional graph that provides the amount of the error as well as its distribution in a frame.
Such a graph results in a detailed distortion map, leading to higher correlation with subjective evaluation. The
frame-level error is expressed as a single numerical value. A plot of the error series of all frames contained in a video
sequence graphically describes the amount and the distribution of error in the distorted sequence.

Finally, the overall quality of the entire sequence is defined as:

$$\text{M-SVD} = \frac{1}{n} \sum_{j=1}^{n} T_j\,\text{M-SVD}_j \qquad (7)$$

where $n$ is the number of frames in a video sequence and $T_j$ is the weighting value assigned to the $j$-th frame.
This leads to a quality measure that is equal to the weighted average M-SVD error measure of all frames. Error
adjustment is incorporated throughout all steps of the video quality assessment system. Since the HVS is a
complicated system, error adjustment attempts to improve the performance of the quality assessment system by
incorporating perceived quality features. In our proposed system, error adjustment is expressed in the edge index and
the weighting factors applied to both local and global error assessment, as well as in the sequence calibration in the
pre-processing step.
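The two weighted combinations described in this section, per frame across the Y, Cb, and Cr components and per sequence across frames, amount to simple arithmetic. The sketch below assumes the 80%/10%/10% component weights stated in the text and reads the sequence-level measure as the weighted sum of frame errors divided by the number of frames; the per-component error values and the default of equal frame weights are made up for illustration.

```python
# Component weights as stated in the text: 80% luminance, 10% each chrominance.
W_Y, W_CB, W_CR = 0.8, 0.1, 0.1

def frame_error(m_y, m_cb, m_cr):
    """Weighted sum of the Y, Cb, and Cr error measures of one frame."""
    return W_Y * m_y + W_CB * m_cb + W_CR * m_cr

def sequence_error(frame_errors, frame_weights=None):
    """Weighted average of frame-level errors; equal weights are an assumption."""
    n = len(frame_errors)
    if frame_weights is None:
        frame_weights = [1.0] * n
    return sum(t * e for t, e in zip(frame_weights, frame_errors)) / n

# Made-up per-component errors (Y, Cb, Cr) for three frames.
components = [(10.0, 2.0, 4.0), (12.0, 3.0, 3.0), (8.0, 1.0, 1.0)]
per_frame = [frame_error(*c) for c in components]
overall = sequence_error(per_frame)
print(per_frame, overall)
```

Plotting `per_frame` against the frame index gives the error series mentioned in the text, while `overall` is the single sequence-level value.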


3.  EXPERIMENTAL  RESULTS 

The proposed algorithm is applied to the VQEG Phase I test dataset for FR-TV video quality assessment. The Video
Quality Experts Group (VQEG) was formed to develop, validate, and standardize new objective measurement methods
for video quality. They collected a large set of subjectively rated video data in which over 26,000 subjective opinion
scores were generated based on 20 different source sequences processed by 16 different video systems and evaluated at
eight independent laboratories worldwide. VQEG provides this valuable dataset, which we can utilize to evaluate and
improve objective video quality measurement.


3.1  Graphical  measure 

The original and processed video sequences are resampled to 4:4:4 Y, Cb, Cr format. A spatial-temporal-luminance
alignment is applied in the system to normalize the input sequences. In our experiment, all frames of the input video
sequence are divided into 8x8 blocks. We computed the SVD of all blocks of the luminance and color components,
respectively, in both the original frames and the corresponding processed frames. The local error measure specified in
equation 1 computes the distance between the singular values of the original frame block and the singular values of the
distorted frame block. The local error output can be mapped to a gray scale image with values in the range [0, 255].
Fig. 5 shows the distortion maps as 2-dimensional and 3-dimensional graphs that provide the amount of the error as well
as its distribution in a frame. It is known that the naked eye is highly sensitive to distortion on edges. An edge
function is applied by calculating the local gradient of the luminance signal (using Sobel-like spatial filtering) in
the luminance layer. Fig. 6 displays the binary image produced by the edge function applied to one frame of a video
sequence. Each block inside a frame receives an edge index as specified in equations 3, 4 and 5. The frame-level error
measure is computed using equation 2 for the Y, Cb and Cr components, respectively, from the local error measures
adjusted with the edge index. The outputs are combined to obtain the overall frame-level error using a weighted
summation. Fig. 7 shows the graphical error measure of one distorted video sequence. In the graph, the horizontal axis
is the frame index within the video sequence and the vertical axis is the M-SVD error measure of each frame. The graph
therefore shows the amount and distribution of error throughout the distorted sequence.

Fig. 5. The distortion maps as 2-dimensional and 3-dimensional graphs for one frame in the luminance layer: (a) original
frame of size 568x680, (b) processed frame of size 568x680, (c) 2-dimensional distortion map of size 71x85,
(d) 3-dimensional distortion map.

Fig. 6. The binary image on the right is the output of the edge function applied to the frame on the left.

Fig. 7. The error series of all frames contained in one distorted video sequence.
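
The block-wise computation described in this section could be sketched roughly as below. This is a minimal illustration and not the implementation used in the experiments: the names `distortion_map` and `to_gray` are hypothetical, the Euclidean distance between singular values is an assumption standing in for equation 1, and the linear rescaling by the maximum is only one possible way to map the errors to [0, 255].

```python
import numpy as np

BLOCK = 8  # block size reported for the experiments


def distortion_map(original, distorted, block=BLOCK):
    """Per-block distance between the singular values of two frames.

    Assumption: Euclidean distance between the singular-value vectors;
    the exact form of the local error is given by equation 1.
    """
    original = np.asarray(original, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    if original.shape != distorted.shape:
        raise ValueError("frames must have the same shape")
    rows, cols = original.shape[0] // block, original.shape[1] // block

    def singular_values(frame):
        # Crop to whole blocks, then reshape to (rows, cols, block, block).
        tiles = frame[: rows * block, : cols * block]
        tiles = tiles.reshape(rows, block, cols, block).swapaxes(1, 2)
        # Batched SVD: one vector of `block` singular values per tile.
        return np.linalg.svd(tiles, compute_uv=False)

    diff = singular_values(original) - singular_values(distorted)
    return np.sqrt((diff ** 2).sum(axis=-1))


def to_gray(error_map):
    """Linearly rescale a distortion map to 8-bit gray levels in [0, 255]."""
    peak = error_map.max()
    if peak == 0:
        return np.zeros(error_map.shape, dtype=np.uint8)
    return np.round(255.0 * error_map / peak).astype(np.uint8)
```

For a 568x680 luminance frame, this yields a 71x85 map, which matches the distortion map size given in the caption of Fig. 5.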

3.2  Numerical  measure

The overall quality of a video sequence is expressed as a single numerical value using equation 7. We follow the
procedures employed in the VQEG Phase I objective video quality model test plan to evaluate the performance of our
system. Four metrics are used in the evaluation of objective results. Metric 1 is the correlation coefficient between
objective and subjective scores after variance-weighted regression analysis, including a test of significance of the
difference. Metric 2 is the correlation coefficient between objective and subjective scores after non-linear regression
analysis. A logistic function used in the first two metrics provides a non-linear map between subjective and objective
scores, where a weighted least squares procedure was applied to the logistic function. The first two metrics measure the
prediction accuracy of an objective model. Metric 3 is the Spearman rank-order correlation coefficient between the
objective and subjective scores. This correlation method only assumes a monotonic relationship between the two scores.
Metric 3 is related to the prediction monotonicity of a model. Metric 4 is the outlier ratio, the ratio of
"outlier points" to the total number of points. The model's prediction consistency can be measured by the number of
outlier points (defined as having an error greater than some threshold) as a fraction of the total number of points. A
smaller outlier fraction means the model's predictions are more consistent.
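
The correlation and outlier computations behind these metrics can be sketched as follows. This is a simplified illustration under stated assumptions: it omits the logistic fitting and variance weighting used for metrics 1 and 2, it assumes the objective scores have already been mapped onto the subjective scale, and the outlier threshold is left as a caller-supplied parameter because the text only says "some threshold". All function names are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Linear correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def outlier_ratio(predicted, subjective, threshold):
    """Fraction of points whose absolute prediction error exceeds threshold."""
    predicted = np.asarray(predicted, dtype=np.float64)
    subjective = np.asarray(subjective, dtype=np.float64)
    return float(np.mean(np.abs(predicted - subjective) > threshold))
```

Because the Spearman coefficient depends only on rank order, any strictly increasing relationship between the two scores gives a value of 1 even when the linear correlation is lower, which is why it serves as a measure of monotonicity rather than accuracy.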

Table I presents the performance comparison of video quality assessment models on VQEG Phase I Test Data Set (all
test video sequences included). P1-P9 [19] are nine different proponent models submitted to VQEG for evaluation. P0
(PSNR) is included by VQEG as a reference objective model. SSIM (Structural Similarity Index) [25] is a new measure
presented in a recent paper. Model M-SVD/Y computes the difference between the original and the distorted frames on
the luminance (Y) component using equations 1, 2, and 7. Model M-SVD/YCbCr is applied to the Y, Cb and Cr color
components independently. The frame-level error measures for each component are combined using equation 6. Model
M-SVD/Edge Detection is applied to Y, Cb and Cr with edge detection performed on the Y component using equations 2, 5,
6 and 7. Table I displays the comparison results of the four metrics for different models. It can be observed that the
proposed M-SVD measure outperforms all other measures by a clear margin. Figure 7 gives the non-linear regression
analysis of the subjective/objective scores on all video sequences in the VQEG Phase I test given by PSNR, M-SVD/Y,
M-SVD/YCbCr and M-SVD/Edge Detection. Each graph shows the scatter plot of subjective and objective scores and
the fitted curve. 160 video sequences are tested for each objective model. In each graph, which is associated with one
model, every point represents one of the 160 test video sequences. The vertical axis indicates the subjective
measurement, denoted by DMOS, while the horizontal axis is the objective measure output.

Table 2. Performance comparison of video quality assessment models on VQEG Phase I Test Data Set (all test video
sequences included).

Fig. 7. The scatter plot comparison of objective models on all video sequences in the VQEG Phase I test dataset given by
PSNR, M-SVD/Y Component, M-SVD/YCbCr and M-SVD/Edge Detection.

4.  CONCLUSION  AND  DISCUSSION 

We presented a new objective video quality assessment system based on singular value decomposition. The graphical
measure displays the amount of distortion as well as the distribution of error in all the frames of the tested video
sequence. The numerical measure, expressed as a single value, is well correlated with subjective evaluation.
Experiments on the VQEG Phase I test data indicate that the performance of M-SVD is considerably better than that of
PSNR and the other proponent models. In particular, the correlation of the M-SVD model improves by approximately 10%
over PSNR in the first two metrics, and M-SVD outperforms SSIM by a clear margin as well.

It should be noted that we tested three models to obtain the correlation with subjective evaluation. In the first model,
namely M-SVD/Y, only the luminance (Y) component was used in the computations. Experiments show that simply
computing the difference between the original and the distorted frames on the Y component using equations 1, 5 and 7
provides reasonably good results comparable to other approaches. In the second model, denoted by M-SVD/YCbCr, the
frame-level quality error measure is a weighted summation of the quality error indexes of the Y, Cb and Cr components,
which contribute with weighting factors of 0.8, 0.1 and 0.1, respectively. Equations 1, 5, 6 and 7 are employed in the
second model, whose performance is slightly better than that of the first model in the evaluation of metrics 2, 3 and 4.
This implies that the Human Visual System (HVS) is much more sensitive to the sharpness of the luminance component than
to that of the chrominance components. However, the chrominance components also contribute to the perceived quality
error measurement. In the third model, namely M-SVD/Edge Detection, the difference at the edges of the luminance
component is broadened with respect to the characteristics of the HVS. Model 3 has a better correlation with the
perceived video quality than the first and the second models.
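
The weighted summation used in the second model can be illustrated with a short sketch. The weights 0.8, 0.1 and 0.1 are the ones stated above; the function name and the dictionary-based interface are assumptions made for this example.

```python
# Weighting factors for the Y, Cb and Cr components as stated in the text.
WEIGHTS = {"Y": 0.8, "Cb": 0.1, "Cr": 0.1}


def combine_components(errors, weights=WEIGHTS):
    """Weighted sum of the frame-level errors of the colour components.

    errors: mapping from component name ("Y", "Cb", "Cr") to its error value.
    """
    missing = set(weights) - set(errors)
    if missing:
        raise KeyError(f"missing component errors: {sorted(missing)}")
    return sum(weights[name] * errors[name] for name in weights)
```

For example, component errors of 10, 20 and 30 for Y, Cb and Cr combine to 0.8 x 10 + 0.1 x 20 + 0.1 x 30 = 13, so the luminance error dominates the frame-level value.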

Our system is based on singular value decomposition (SVD). The computation of the SVD is of order O(n^3), which makes
the computations slower for larger frame sizes. If the frame is segmented into smaller blocks (we use a block size of
8x8) and the SVD is applied to each block, the total processing time is much lower. We also noticed that using randomly
selected sample blocks in sample frames, rather than all blocks in all frames of a video sequence, would reduce the
computation cost significantly while still maintaining reasonably good results.
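
One way the random block sampling mentioned above might look is sketched below. It is an illustration under several assumptions rather than the procedure used in the experiments: the sampling fraction, the fixed seed, the Euclidean distance between singular values, and the plain mean used in place of the frame-level measure of equation 2 are all choices made for this example, and the function name is hypothetical.

```python
import numpy as np


def sampled_frame_error(original, distorted, fraction=0.25, block=8, seed=0):
    """Mean singular-value distance over a random subset of blocks.

    Returns the estimated error and the number of blocks actually used.
    """
    original = np.asarray(original, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    rows, cols = original.shape[0] // block, original.shape[1] // block
    total = rows * cols
    count = max(1, int(round(fraction * total)))
    rng = np.random.default_rng(seed)
    # Draw distinct block indices so that no block is evaluated twice.
    chosen = rng.choice(total, size=count, replace=False)

    errors = []
    for index in chosen:
        r, c = divmod(int(index), cols)
        window = (slice(r * block, (r + 1) * block),
                  slice(c * block, (c + 1) * block))
        s_ref = np.linalg.svd(original[window], compute_uv=False)
        s_deg = np.linalg.svd(distorted[window], compute_uv=False)
        errors.append(np.linalg.norm(s_ref - s_deg))
    return float(np.mean(errors)), count
```

Since the cost of the block-wise approach grows with the number of blocks processed, evaluating a quarter of the blocks requires roughly a quarter of the SVD computations.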

It should be mentioned that we found that M-SVD has relatively poor correlation with perceived video quality in
sequences with large global motion. This may be because the designed system does not incorporate motion information
as one aspect of the evaluation. In our experiments, we applied an approach to detect the occurrence of large motions
and tried to improve the error measure in the sequences with motion. The issues relating to global motion are still
under investigation.